Counterfactual Mix-Up for Visual Question Answering
Abstract
Counterfactuals have been shown to be a powerful method for alleviating the unimodal bias of Visual Question Answering (VQA). However, existing counterfactual methods tend to generate samples that are not diverse, or they require auxiliary models to synthesize additional data. In this regard, we propose a simpler and more diverse sample synthesis method called Counterfactual Mix-Up (CoMiU), which generates counterfactual image features and questions through batch-wise swapping at the local object level and word level. This efficiently facilitates the generation of abundant counterfactual samples, which help improve the robustness of VQA models. Moreover, with the creation of these samples, we introduce two robust and stable contrastive loss functions, namely the Batch-Contrastive loss and the Answer-Contrastive loss. We test our method on various challenging robust testing setups to show the advantages of the proposed method compared to current state-of-the-art methods.
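The abstract does not give implementation details, so the following is only a minimal sketch of what batch-wise swapping of object-level image features could look like. Everything here is an assumption made for illustration: the function name `batchwise_object_swap`, the `swap_ratio` parameter, the cyclic-shift pairing of samples, and the feature layout `(batch, num_objects, dim)`. It should not be read as the CoMiU procedure itself.

```python
import numpy as np


def batchwise_object_swap(features, swap_ratio=0.5, seed=0):
    """Illustrative batch-wise object swap (hypothetical, not the paper's code).

    features: array of shape (batch, num_objects, dim), one row of object
              features per image.
    Returns the mixed features, the partner index for each sample, and the
    boolean mask of swapped object slots.
    """
    rng = np.random.default_rng(seed)
    batch, num_objects, _ = features.shape

    # Assumption: pair each sample with another sample in the same batch via
    # a random cyclic shift, so no sample is paired with itself (batch > 1).
    shift = int(rng.integers(1, batch)) if batch > 1 else 0
    partner = (np.arange(batch) + shift) % batch

    # Assumption: swap a fixed fraction of object slots per sample.
    num_swap = int(round(swap_ratio * num_objects))
    mask = np.zeros((batch, num_objects), dtype=bool)
    for i in range(batch):
        idx = rng.choice(num_objects, size=num_swap, replace=False)
        mask[i, idx] = True

    # Take the partner's object features where the mask is set,
    # and keep the original features elsewhere.
    mixed = np.where(mask[:, :, None], features[partner], features)
    return mixed, partner, mask


if __name__ == "__main__":
    feats = np.random.default_rng(1).normal(size=(4, 5, 3))
    mixed, partner, mask = batchwise_object_swap(feats, swap_ratio=0.4)
    print(mixed.shape, partner, mask.sum(axis=1))
```

Under the same assumptions, a word-level analogue would apply the identical masking step to an array of question token IDs of shape `(batch, num_tokens)` instead of object features.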
Related resources
Investigating Embedded Question Reuse in Question Answering
The investigation presented in this paper is a novel method in question answering (QA) that enables a QA system to gain performance through reuse of information in the answer to one question to answer another related question. Our analysis shows that a pair of questions in a general open domain QA can have an embedding relation through their mentions of noun phrase expressions. We present methods f...
Revisiting Visual Question Answering Baselines
Visual question answering (VQA) is an interesting learning setting for evaluating the abilities and shortcomings of current systems for image understanding. Many of the recently proposed VQA systems include attention or memory mechanisms designed to support “reasoning”. For multiple-choice VQA, nearly all of these systems train a multi-class classifier on image and question features to predict ...
iVQA: Inverse Visual Question Answering
In recent years, visual question answering (VQA) has become topical as a long-term goal to drive computer vision and multi-disciplinary AI research. The premise of VQA’s significance is that both the image and textual question need to be well understood and mutually grounded in order to infer the correct answer. However, current VQA models perhaps ‘understand’ less than initially hoped, and in...
Differential Attention for Visual Question Answering
In this paper we aim to answer questions based on images when provided with a dataset of question-answer pairs for a number of images during training. A number of methods have focused on solving this problem by using image based attention. This is done by focusing on a specific part of the image while answering the question. Humans also do so when solving this problem. However, the regions that...
Interpretable Counting for Visual Question Answering
Questions that require counting a variety of objects in images remain a major challenge in visual question answering (VQA). The most common approaches to VQA involve either classifying answers based on fixed length representations of both the image and question or summing fractional counts estimated from each section of the image. In contrast, we treat counting as a sequential decision process ...
Journal
Journal title: IEEE Access
Year: 2023
ISSN: 2169-3536
DOI: https://doi.org/10.1109/access.2023.3303891